An upper bound on community size in scalable community detection 
It is well known that community detection methods based on modularity optimization
often fail to discover small communities. Several objective functions used for
community detection therefore involve a resolution parameter that allows the detection
of communities at different scales. We provide an explicit upper bound on the
size of the communities resulting from the optimization of several of these
functions. We also show with a simple example that the use of the resolution parameter
may artificially force the complete disaggregation of large and densely connected
communities.

Many popular methods for detecting communities in networks are based on the optimization
of the modularity function, which is a measure of the quality of a network partition into
communities. The modularity of a partition compares the density of edges inside communities
to the corresponding density expected in a null model. It has been shown by Fortunato
and Barthélemy that modularity suffers from a so-called resolution limit: modularity
optimization methods often fail to identify small communities.

Several authors have proposed objective functions for community detection that incorporate
a tunable resolution parameter so as to allow community detection at different scales.
One such function, introduced by Reichardt and Bornholdt in 2006, can be written in the
following form:

$$Q_\gamma = \sum_s \left[ \frac{l_s}{L} - \gamma \left( \frac{d_s}{2L} \right)^2 \right] \tag{1}$$

In this expression, the sum is over the communities $s$, $l_s$ is the number of edges inside
community $s$, $d_s$ is the sum of the degrees of the nodes in community $s$ and $L$ is the
total number of edges in the network. The case $\gamma = 1$ corresponds to modularity for the
configuration model as defined by Newman. Higher resolutions are obtained by choosing higher
values for the resolution parameter $\gamma$ in Equation (1). Objective functions that are
mathematically equivalent to $Q_\gamma$ have been proposed in a number of other contexts. In
particular, Lambiotte et al. have shown that the function $Q_\gamma$ corresponds to the
first-order approximation of a dynamical process driven by the Laplacian of the graph where
the resolution parameter plays the role of a timescale. The function $Q_\gamma$ is also a
special case (for $\omega = 0$) of the function used by Mucha et al. to study so-called
multislice networks. It has been shown by Kumpula et al. that methods based on the
optimization of $Q_\gamma$ suffer from a resolution limit similar to the one reported for
$\gamma = 1$.

In this note, we show that any resolution parameter value $\gamma > 1$ imposes a non-trivial
upper bound on the size of communities. To establish this bound, consider two communities
whose node degrees sum to $d_1$ and $d_2$, respectively, and which contain $l_1$ and $l_2$
internal edges, respectively. Let $e$ be the number of edges connecting the two communities.
Now compare the situation where the communities are separate with the one where the two
communities are merged into one. In the latter case, the total degree of the community is
$d = d_1 + d_2$ and the total number of internal edges is $l_1 + l_2 + e$. An elementary
calculation shows that the difference in the objective function between these two situations
(merged minus separate) is given by

$$\Delta Q = \frac{e}{L} - \gamma \frac{d_1 d_2}{2L^2},$$

with separate communities leading to a larger value of the objective function when
$\Delta Q < 0$. Since $e \le d_1$, we have

$$\Delta Q \le \frac{d_1}{L} \left( 1 - \gamma \frac{d_2}{2L} \right),$$

and so $\Delta Q < 0$ when $d_2/(2L) > 1/\gamma$. Thus, if one can find a set of nodes in a
community whose total degree exceeds a fraction $1/\gamma$ of the total degree in the network,
then the value of the objective function increases when this set of nodes is made a separate
community. This imposes a non-trivial upper bound on community sizes as soon as $\gamma > 1$.
In particular, removing the lowest-degree node from a community of $n$ nodes leaves a set that
holds at least a fraction $(n-1)/n$ of the community's degree, so a community of $n$ nodes may
not contain a fraction of the total degree (or of the total number of edges) larger than
$n/((n-1)\gamma)$.
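
The merge criterion and the resulting bound can be checked numerically. The following is a minimal sketch, not part of the original note: the function names and the sample values are hypothetical and chosen only for illustration. It compares the closed-form difference with the value obtained directly from the definition of the objective function, and evaluates the bound $n/((n-1)\gamma)$.

```python
def delta_q_merge(e, d1, d2, L, gamma):
    """Closed-form change in Q_gamma when two communities are merged."""
    return e / L - gamma * d1 * d2 / (2 * L ** 2)


def delta_q_direct(l1, l2, e, d1, d2, L, gamma):
    """Same quantity from the definition: Q_gamma(merged) - Q_gamma(separate)."""
    def term(l, d):
        # Contribution of one community with l internal edges and total degree d.
        return l / L - gamma * (d / (2 * L)) ** 2
    return term(l1 + l2 + e, d1 + d2) - (term(l1, d1) + term(l2, d2))


def max_degree_fraction(n, gamma):
    """Bound n / ((n - 1) * gamma) on a community's share of total degree (n >= 2)."""
    return n / ((n - 1) * gamma)


# Hypothetical values: two communities joined by e edges, with no other
# edges leaving them, inside a network of L edges in total.
l1, l2, e, L, gamma = 10, 20, 3, 100, 1.5
d1, d2 = 2 * l1 + e, 2 * l2 + e

closed = delta_q_merge(e, d1, d2, L, gamma)
direct = delta_q_direct(l1, l2, e, d1, d2, L, gamma)
print(closed, direct)                   # the two values agree up to rounding
print(max_degree_fraction(16, gamma))   # bound for a 16-node community
```

A negative value of the difference means that keeping the two communities separate yields the larger objective value, which is the case for these sample numbers.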

We now show with an example that the use of a resolution parameter may disaggregate
large and densely connected communities. Consider the network consisting of a clique of 16
nodes and of four cliques of 4 nodes each. There is one edge between the clique of 16 nodes
and each of the cliques of 4 nodes. Each pair of cliques of 4 nodes is connected by 2 edges
(Figure 1). The partition of optimal modularity ($\gamma = 1$) consists of two communities
of 16 nodes each, as shown on the left of Figure 1. This is a typical illustration of the
resolution limit, where modularity optimization fails to detect the four small cliques of
4 nodes. When $\gamma$ is increased to enable the detection of the smaller cliques, the larger
clique of 16 nodes splits into 16 distinct communities of one node each (middle of Figure 1,
$\gamma = 1.5$). As $\gamma$ is further increased, the second community finally splits into
4 cliques (right of Figure 1, $\gamma = 2$).
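
The behaviour described above can be reproduced by evaluating the objective function on a few candidate partitions. The sketch below is illustrative only and makes assumptions that the text does not fix: which nodes carry the inter-clique edges, the helper names `build_example`, `q_gamma` and `best_candidate`, and the choice of candidates. It compares four hand-picked partitions and does not perform a global optimization.

```python
from itertools import combinations


def build_example():
    """Edge list for the example: one 16-clique and four 4-cliques."""
    big = list(range(16))
    small = [list(range(16 + 4 * k, 20 + 4 * k)) for k in range(4)]
    edges = list(combinations(big, 2))           # 120 edges inside the big clique
    for k, clique in enumerate(small):
        edges += combinations(clique, 2)         # 6 edges inside each small clique
        edges.append((big[k], clique[0]))        # one edge to the big clique (assumed endpoints)
    for a, b in combinations(small, 2):
        edges += [(a[1], b[1]), (a[2], b[2])]    # 2 edges per pair (assumed endpoints)
    return edges, big, small


def q_gamma(edges, partition, gamma):
    """Sum over communities of l_s / L - gamma * (d_s / (2L))^2."""
    L = len(edges)
    community_of = {v: i for i, comm in enumerate(partition) for v in comm}
    internal = [0] * len(partition)
    degree = [0] * len(partition)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(l / L - gamma * (d / (2 * L)) ** 2 for l, d in zip(internal, degree))


edges, big, small = build_example()
singletons = [[v] for v in big]
merged_small = [v for clique in small for v in clique]
candidates = {
    "two communities of 16": [big, merged_small],
    "16 singletons + 16 small-clique nodes": singletons + [merged_small],
    "16 singletons + four 4-cliques": singletons + small,
    "16-clique + four 4-cliques": [big] + small,
}


def best_candidate(gamma):
    """Name of the candidate partition with the largest Q_gamma."""
    return max(candidates, key=lambda name: q_gamma(edges, candidates[name], gamma))


for gamma in (1.0, 1.5, 2.0):
    print(gamma, best_candidate(gamma))
```

Under these assumptions, the best of the four candidates is the two-community partition at $\gamma = 1$, the partition with sixteen singletons and one 16-node community at $\gamma = 1.5$, and the partition with sixteen singletons and four 4-cliques at $\gamma = 2$. The intuitive partition (the 16-clique plus the four 4-cliques) is not the best candidate at any of the three values.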

As this simple example shows, when optimizing the objective function $Q_\gamma$ for
$\gamma > 1$ one should be aware of the tendency of the resulting optimum to disaggregate
large and dense communities, and be cautious when interpreting the partitions obtained.

FIG. 1. Community partitioning of the same network for different values of the tunable resolution
parameter $\gamma$. On the left, the partition obtained for $\gamma = 1$ consists of two
communities of 16 nodes each; this is a typical example of the resolution limit of modularity,
where modularity optimization fails to detect the four small cliques of 4 nodes. As the
resolution parameter is increased to $\gamma = 1.5$ (middle), the method still fails to detect
the four small cliques but the nodes in the large clique now form sixteen distinct one-node
communities. When $\gamma = 2$ (right) the four small cliques are finally separated.